Hashing Over Predicted Future Frames for Informed Exploration of Deep Reinforcement Learning
Authors
Abstract
In reinforcement learning (RL) tasks, an efficient exploration mechanism should encourage an agent to take actions that lead to less frequently visited states, which may yield a higher cumulative future return. However, both anticipating the future and evaluating the frequency of states are non-trivial tasks, especially in deep RL domains where a state is represented by high-dimensional image frames. In this paper, we propose a novel informed exploration framework for deep RL tasks, in which an RL agent is equipped to predict future transitions and to evaluate the frequency of the predicted future frames in a meaningful manner. To this end, we train a deep prediction model to generate future frames given a state-action pair, and a convolutional autoencoder model to generate deep features for hashing over the seen frames. In addition, to utilize the counts derived from the seen frames to evaluate the frequency of the predicted frames, we tackle the challenge of making the hash codes of the predicted future frames match those of their corresponding seen frames. In this way, we can derive a reliable metric for evaluating the novelty of the future direction pointed to by each action, and hence inform the agent to explore the least frequent one. We use Atari 2600 games as the testing environment and demonstrate that the proposed framework achieves a significant performance gain over a state-of-the-art informed exploration approach in most of the domains.
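To make the pipeline described in the abstract concrete, below is a minimal, illustrative sketch of the count-based informed-exploration step, assuming a SimHash-style binarization of autoencoder features. The encode and predict_next_frame stand-ins are hypothetical placeholders for the paper's convolutional autoencoder and deep prediction model, and the feature and hash sizes are assumptions rather than the paper's settings.

# Minimal sketch of count-based informed exploration over predicted frames.
# Hypothetical stand-ins: `encode` replaces the convolutional autoencoder and
# `predict_next_frame` replaces the deep prediction model; hashing here is a
# SimHash-style binarization of the features (an assumption, not the paper's exact scheme).
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)

FRAME_DIM = 84 * 84          # flattened grayscale Atari-like frame (assumed size)
FEATURE_DIM = 512            # autoencoder bottleneck size (assumed)
HASH_BITS = 64               # length of the binary hash code (assumed)

# Stand-in for the convolutional autoencoder's encoder: a fixed random projection.
W_enc = rng.normal(size=(FRAME_DIM, FEATURE_DIM)) / np.sqrt(FRAME_DIM)
def encode(frame):
    return np.tanh(frame.reshape(-1) @ W_enc)

# SimHash-style code: sign pattern of random projections of the deep features.
W_hash = rng.normal(size=(FEATURE_DIM, HASH_BITS))
def hash_code(frame):
    bits = (encode(frame) @ W_hash) > 0
    return bits.tobytes()              # hashable key for the count table

counts = defaultdict(int)

def record_seen(frame):
    # Update visit counts with a frame the agent has actually observed.
    counts[hash_code(frame)] += 1

def informed_action(state, actions, predict_next_frame):
    # Pick the action whose predicted next frame falls in the least frequent
    # hash bucket (ties broken randomly).
    novelty = [counts[hash_code(predict_next_frame(state, a))] for a in actions]
    least = min(novelty)
    candidates = [a for a, c in zip(actions, novelty) if c == least]
    return rng.choice(candidates)

# Tiny demo with a random frame and a dummy one-step predictor.
def dummy_predictor(state, action):
    return state + 0.01 * action * rng.normal(size=state.shape)

state = rng.normal(size=(84, 84))
record_seen(state)
print(informed_action(state, actions=[0, 1, 2, 3], predict_next_frame=dummy_predictor))

At each step the agent records the hash of the frame it actually observes and, when exploring, queries the counts of the buckets that the predicted next frames fall into, preferring the least visited direction.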
Similar resources
Deep Reinforcement Learning for Image Hashing
Deep hashing methods have received much attention recently, as they achieve promising results by taking advantage of the strong representation power of deep networks. However, most existing deep hashing methods learn a whole set of hashing functions independently and directly, while ignoring the correlation between different hashing functions, which can greatly promote retrieval accuracy. Inspire...
Action-Conditional Video Prediction using Deep Networks in Atari Games
Motivated by vision-based reinforcement learning (RL) problems, in particular Atari games from the recent benchmark Arcade Learning Environment (ALE), we consider spatio-temporal prediction problems where future image-frames depend on control variables or actions as well as previous frames. While not composed of natural scenes, frames in Atari games are high-dimensional in size, can involve te...
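For illustration, the following is a minimal action-conditional frame predictor in PyTorch, following the encode / action-transform / decode idea described above in spirit only; the layer sizes, the 4-frame 84x84 input, and the multiplicative action interaction are assumptions made for this sketch, not the architecture reported in that paper.

# Minimal sketch of an action-conditional next-frame predictor.
# All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ActionConditionalPredictor(nn.Module):
    def __init__(self, num_actions, latent_dim=256):
        super().__init__()
        # Convolutional encoder over a stack of 4 past grayscale frames.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 9 * 9, latent_dim),
        )
        # Multiplicative interaction between frame features and the action.
        self.frame_factor = nn.Linear(latent_dim, latent_dim, bias=False)
        self.action_factor = nn.Linear(num_actions, latent_dim, bias=False)
        # Decoder maps the action-transformed latent back to a predicted frame.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 9 * 9), nn.ReLU(),
            nn.Unflatten(1, (64, 9, 9)),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),
        )

    def forward(self, frames, action_onehot):
        h = self.encoder(frames)                                   # (B, latent_dim)
        h = self.frame_factor(h) * self.action_factor(action_onehot)
        return self.decoder(h)                                     # predicted next frame

# Shape check with a batch of two 4x84x84 frame stacks and one-hot actions.
model = ActionConditionalPredictor(num_actions=4)
x = torch.zeros(2, 4, 84, 84)
a = torch.eye(4)[:2]
print(model(x, a).shape)  # torch.Size([2, 1, 84, 84])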
Meta-Reinforcement Learning of Structured Exploration Strategies
Exploration is a fundamental challenge in reinforcement learning (RL). Many of the current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this ...
A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning and its aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey for MTL. First, we classify different MTL algorithms into several categories: feature learning approach, low-rank approach, task clustering approach,...
A Study of Qualitative Knowledge-Based Exploration for Continuous Deep Reinforcement Learning
As an important method for solving sequential decision-making problems, reinforcement learning learns the policy of tasks through interaction with the environment. But it has difficulty scaling to large-scale problems. One of the reasons is the exploration-exploitation dilemma, which may lead to inefficient learning. We present an approach that addresses this shortcoming by introducing qualitat...
Journal: CoRR
Volume: abs/1707.00524
Pages: -
Publication date: 2017